Week Four for the Housing Team
AI-Driven Housing Evaluation for Rural Community Development
This blog outlines where the Housing Team is at after week four of DSPG. We have made a lot of progress in the past four weeks, and we are excited to share it with you!
Project Overview
This is the project plan we came up with the first week of DSPG. This project is intended to span over three years with DPSG, and different interns will be working on it in the coming years. Thus, the project plan is ambitious for this summer.
Problem Statement
The absence of a comprehensive and unbiased assessment of housing quality in rural communities poses challenges in identifying financing gaps and effectively allocating resources for housing improvement. Consequently, this hinders the overall well-being and health of residents, impacts workforce stability, diminishes rural vitality, and undermines the economic growth of Iowa. Moreover, the subjective nature of evaluating existing housing conditions and the limited availability of resources for thorough investigations further compound the problem. To address these challenges, there is a pressing need for an AI-driven approach that can provide a more accurate and objective evaluation of housing quality, identify financing gaps, and optimize the allocation of local, state, and federal funds to maximize community benefits.
Utilizing web scraping techniques to collect images of houses from various assessor websites, an AI model can be developed to analyze and categorize housing features into good or poor quality. This can enable targeted investment strategies. It allows for the identification of houses in need of improvement and determines the areas where financial resources should be directed. By leveraging AI technology in this manner, the project seeks to streamline the housing evaluation process, eliminate subjective biases, and facilitate informed decision-making for housing investment and development initiatives in rural communities
Goals and Objectives
Image Gathering
Zillow
Realtors.com
County Assessor Pages
Vangaurd: Independence
Beacon Schneider: Slater, New Hampton, and Grundy Center
Build, Train, and Test AI Models
Vegetation Model
Siding Model
Gutter Model
Create Database of Housing Information
Zillow
Realtors.com
County Assessor Pages
Vanguard: Independence
Beacon Schneider: Slater, New Hampton, and Grundy Center
Our Progress
We have been making good progress to complete the goals and objectives we outlined above. Since the beginning of the Data Science for the Public Good Program, we have been expanding our knowledge of data science, particularly in areas that relate to this housing project. We have been learning and covering new concepts through Data Camp. We have also watched two webinars on TidyCensus training, as well as started creating AI Models to practice with.
Data Camp Training:
GitHub Concepts
AI Fundamentals
Introduction to R
Intermediate R
Introduction to the Tidyverse
Web Scraping in R
Introduction to Deep Learning with Keras
TidyCensus Demographic Data Collection:
One of the first steps in our project was to explore the available demographic data in our selected cities and counties. We thought it valuable to understand the demographic data, and we have represented in the plots below:
Creating Test AI Models:
The next step was creating an AI Model. We decided to create an AI Model early in the project before finishing the housing data collection so that we had a better understanding when it came to putting everything together. The AI Model below tests for vegetation in front of houses.
This Week:
In-Person Data Collection
On Tuesday this week, the entire DSPG program went to Slater to practice data collection in person. The housing group took this as an opportunity to collect some housing photos on the ground to use in our AI Model later on.
Google Street View and URLs
We are getting the majority of our photos for the AI to use from Google Street View. Google has an API key that you can use to generate an image for a specific address. We spent the first half of this week pulling addresses from each of our cities and creating URLs to pull the images from Google Street View.
We ran into a couple of problems when doing this, the biggest of which is displayed in the images below. Because we are working with cities in rural areas, there is not Google Street View images available for every street in our cities.
Below is a sample from the tables we created containing the URLs to grab the images from Google Street View.
Web Scraping
Once we were finished collecting addresses and generating URLs, we moved on to scraping the web for more images. We decided to grab images from Zillow, Realtors.com, and the County Assessor pages for our cities. We were able to successfully scrape images from Zillow this week.
When web scraping, we ran into a problem with blurred houses. Upon some research, we found out that some home owners pay Google to have their home blurred for Google Street View.
The next websites we are scraping hold the Iowa Assessors housing data for our four cities. We found a webpage that has links to every counties assessor website. The key at the bottom shows where the data is held. Yellow means the data is held by Vanguard. Blue means the data is online. For most of Iowa’s counties, this means that it is held by Beacon Schneider.
Web Scraping Issues
Independence had issues where the house number was listed as 100/101 and also had 100 1/2. Thankfully the function to grab the Google images ran all the way but said there were 50 errors including both addresses with two house numbers and with half signs. If you remove one of the address numbers or remove the 1/2 (basically removing the / sign) the image url still brings you to a Google image. We could possibly go back through and grab these URL’s and alter them to try and grab these addresses if necessary.
Happies
had a great meeting with Erin Olson-Douglas
finished collecting and creating URL addresses for Google Street View images
Zillow owns Trulia so we don’t have to web scrape both sites :)
able to successfully scrape some things from Zillow !
Crappies
Web Scraping
Beacon and Vanguard have anti-web scraping protections
Angelina’s Excel is stupid
Future Plans and Next Steps
Once we are able to scrape enough images off of Zillow, Realtors.com, and the assessor pages, we will be able to move on with creating AI Models. The diagram below outlines how the AI Models will work in the next steps of out project.